AITopics | quantization bit


quantization bit

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hierarchical Channel-spatial Encoding for Communication-efficient Collaborative Learning

Neural Information Processing Systems, Apr-25-2026, 03:47:17 GMT

Collaborative learning (CL) systems often face a performance bottleneck from limited bandwidth: multiple low-end devices continuously generate data and transmit intermediate features to the cloud for incremental training. Improving communication efficiency by reducing traffic size is therefore one of the most crucial issues for realistic deployment. Existing systems mostly compress features at the pixel level and ignore the characteristics of feature structure, which could be further exploited for more efficient compression. In this paper, we offer new insights into implementing scalable CL systems through hierarchical compression of features, termed Stripe-wise Group Quantization (SGQ). Unlike previous unstructured quantization methods, SGQ captures both channel and spatial similarity in pixels and encodes features at these two levels simultaneously to gain a much higher compression ratio. In particular, we restructure features based on inter-channel similarity in the forward pass and bound the gradient deviation caused by quantization in the backward pass. This two-stage pipeline gives SGQ the same sublinear convergence order as vanilla SGD-based optimization. Extensive experiments show that SGQ achieves a traffic reduction ratio of up to 15.97× and a 9.22× image-processing speedup over uniformly quantized training, while preserving accuracy comparable to FP32, even with 4-bit quantization. This indicates that SGQ can be applied to a wide spectrum of edge intelligence applications.
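
The abstract does not give SGQ's algorithm, so the following is only a minimal sketch of the general idea it builds on: quantizing a feature map group by group, with similar channels sharing one range. It assumes a hypothetical grouping heuristic (sort channels by mean, then cut into fixed-size groups) and per-group uniform k-bit codes; the names `group_channels`, `quantize_feature_map`, `dequantize`, and `payload_bits` are illustrative, and SGQ's stripe-wise spatial encoding and gradient-deviation bound are not modeled.

```python
from typing import Dict, List, Tuple

# One quantized group: (offset, scale, {channel index: integer codes}).
Group = Tuple[float, float, Dict[int, List[int]]]


def group_channels(channels: List[List[float]], group_size: int) -> List[List[int]]:
    """Hypothetical similarity heuristic: sort channels by mean value and
    cut the sorted order into fixed-size groups of similar channels."""
    order = sorted(range(len(channels)),
                   key=lambda c: sum(channels[c]) / len(channels[c]))
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


def quantize_feature_map(channels: List[List[float]], group_size: int,
                         bits: int) -> List[Group]:
    """Uniform asymmetric quantization with one offset/scale per channel group."""
    levels = (1 << bits) - 1
    groups: List[Group] = []
    for idx in group_channels(channels, group_size):
        values = [v for c in idx for v in channels[c]]
        lo, hi = min(values), max(values)
        scale = (hi - lo) / levels if hi > lo else 1.0
        codes = {c: [round((v - lo) / scale) for v in channels[c]] for c in idx}
        groups.append((lo, scale, codes))
    return groups


def dequantize(groups: List[Group], num_channels: int) -> List[List[float]]:
    """Rebuild approximate channel values from integer codes and group parameters."""
    out: List[List[float]] = [[] for _ in range(num_channels)]
    for lo, scale, codes in groups:
        for c, q in codes.items():
            out[c] = [lo + scale * k for k in q]
    return out


def payload_bits(groups: List[Group], bits: int) -> int:
    """Bits on the wire: k bits per value plus an FP32 offset and scale per group."""
    num_values = sum(len(q) for _, _, codes in groups for q in codes.values())
    return num_values * bits + len(groups) * 2 * 32
```

With eight channels of 16 values, 4-bit codes, and groups of four, the payload is 128 × 4 + 2 × 64 = 640 bits against 4,096 bits in FP32, a 6.4× reduction. Larger groups amortize the per-group header further but force less similar channels to share one range, which increases rounding error; that tension is what a similarity-aware grouping tries to manage.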

artificial intelligence, feature map, machine learning

Neural Information Processing Systems

Country:

Asia > China (0.46)
North America > United States (0.28)

Genre: Research Report (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Hierarchical Channel-spatial Encoding for Communication-efficient Collaborative Learning

Neural Information Processing Systems, Feb-18-2026, 23:37:26 GMT

Existing systems mostly compress features at pixel level and ignore the characteristics of feature structure, which could be further exploited for more efficient compression.

artificial intelligence, machine learning, quantization

Neural Information Processing Systems

Country:

North America > United States (0.05)
Africa > Ethiopia (0.04)
Europe > United Kingdom (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hierarchical Channel-spatial Encoding for Communication-efficient Collaborative Learning

Neural Information Processing Systems, Feb-18-2026, 23:37:22 GMT

Existing systems mostly compress features at pixel level and ignore the characteristics of feature structure, which could be further exploited for more efficient compression.

artificial intelligence, machine learning, quantization

Neural Information Processing Systems

Country:

North America > United States (0.05)
Africa > Ethiopia (0.05)
Europe > United Kingdom (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

0e230b1a582d76526b7ad7fc62ae937d-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 11:39:13 GMT

flexor, quantization, tanh

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

0e230b1a582d76526b7ad7fc62ae937d-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-7-2026, 11:39:01 GMT

More extensive and thorough experiments are needed. Is sub-1-bit quantization available only through FleXOR, or do some weights use more than 1 bit while others use much less? The reviewer did not find results in the paper that used quantized inputs. "Input weight format" should read "Internal weight format."

1-bit quantization, artificial intelligence, quantization

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.33)

FleXOR: Trainable Fractional Quantization

Lee, Dongsoo, Kwon, Se Jung

Neural Information Processing Systems, Oct-2-2025, 01:17:31 GMT

Quantization based on binary codes is gaining attention because each quantized bit can be used directly in computations, via look-up tables, without dequantization.
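
As a hedged illustration of why binary-code quantization avoids dequantization, the sketch below assumes the common greedy decomposition w ≈ Σᵢ αᵢbᵢ with bᵢ ∈ {−1, +1}ⁿ, plus a look-up-table inner product: for every group of `mu` inputs, all 2^mu signed partial sums are precomputed once, and each group of weight bits is then used directly as a table index. The function names are hypothetical, and this is not FleXOR's fractional-bit scheme.

```python
from typing import List, Tuple


def binary_code_quantize(w: List[float], num_bits: int) -> Tuple[List[float], List[List[int]]]:
    """Greedy binary-code quantization: w ~= sum_i alpha_i * b_i, b_i in {-1,+1}^n."""
    residual = list(w)
    alphas: List[float] = []
    planes: List[List[int]] = []
    for _ in range(num_bits):
        b = [1 if r >= 0 else -1 for r in residual]
        # Least-squares scale for b = sign(residual) is mean(|residual|).
        alpha = sum(abs(r) for r in residual) / len(residual)
        residual = [r - alpha * s for r, s in zip(residual, b)]
        alphas.append(alpha)
        planes.append(b)
    return alphas, planes


def build_luts(x: List[float], mu: int) -> List[List[float]]:
    """For each group of mu inputs, precompute all 2**mu signed partial sums.
    Bit j of the table index set means +x_j; cleared means -x_j."""
    luts = []
    for start in range(0, len(x), mu):
        chunk = x[start:start + mu]
        luts.append([sum(v if (p >> j) & 1 else -v for j, v in enumerate(chunk))
                     for p in range(1 << len(chunk))])
    return luts


def lut_dot(luts: List[List[float]], alphas: List[float],
            planes: List[List[int]], mu: int) -> float:
    """x . w_hat computed from the bit planes alone: each group of mu bits is
    packed into a table index, so no dequantized weight is ever formed."""
    total = 0.0
    for alpha, b in zip(alphas, planes):
        acc = 0.0
        for g, table in enumerate(luts):
            index = sum(1 << j for j, s in enumerate(b[g * mu:(g + 1) * mu]) if s > 0)
            acc += table[index]
        total += alpha * acc
    return total
```

Because the tables depend only on the input, they can be shared across every row of a weight matrix and every bit plane, which is where the savings come from when the matrix has many rows.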

artificial intelligence, machine learning, quantization

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

0e230b1a582d76526b7ad7fc62ae937d-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-2-2025, 01:17:21 GMT

1-bit quantization, artificial intelligence, quantization

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.33)

Review for NeurIPS paper: HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

Neural Information Processing Systems, Aug-16-2025, 16:29:59 GMT

Summary and Contributions: This paper suggests that the Hessian trace can be a good metric for automatically deciding the number of quantization bits for each layer, unlike previous attempts such as using the top Hessian eigenvalue. Some mathematical analysis is provided to support that the Hessian trace is better than the top Hessian eigenvalue, and memory footprint and model accuracy are compared on several models using the ImageNet dataset. The paper also shows that the Hessian trace can be approximated efficiently with Hutchinson's algorithm.

Strengths:
- Hessian-related metrics have been widely adopted to represent the differing sensitivity of layers. This paper compares a few different Hessian-related approaches and provides some mathematical analysis to argue why the Hessian trace can be considered a good metric for producing an optimal number of quantization bits.
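
Hutchinson's algorithm estimates a trace from Hessian-vector products alone: for random vectors v with independent ±1 entries, E[vᵀHv] = tr(H). The minimal sketch below assumes an explicit matrix as a stand-in for the autodiff Hessian-vector product a real network would use; `hutchinson_trace` and `matvec` are illustrative names.

```python
import random
from typing import Callable, List

Vector = List[float]


def hutchinson_trace(hvp: Callable[[Vector], Vector], dim: int,
                     num_samples: int, seed: int = 0) -> float:
    """Estimate tr(H) as the average of v^T H v over Rademacher vectors v.
    Only Hessian-vector products are needed, never H itself."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        total += sum(vi * hvi for vi, hvi in zip(v, hvp(v)))
    return total / num_samples


def matvec(H: List[Vector]) -> Callable[[Vector], Vector]:
    """Stand-in for an autodiff Hessian-vector product: multiply by an explicit matrix."""
    return lambda v: [sum(h * x for h, x in zip(row, v)) for row in H]
```

For a diagonal H every sample is exact; off-diagonal entries add variance that shrinks as 1/num_samples. In a HAWQ-V2-style setting, `hvp` would instead differentiate the gradient with respect to one layer's parameters, and the resulting trace, averaged over that layer's parameter count, would rank the layer's sensitivity when assigning bit widths.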

hessian aware trace-weighted quantization, hessian trace, top hessian eigenvalue

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.41)

Deploying Large AI Models on Resource-Limited Devices with Split Federated Learning

Qiang, Xianke, Liu, Hongda, Zhang, Xinran, Chang, Zheng, Liang, Ying-Chang

arXiv.org Artificial Intelligence, Apr-15-2025

Abstract: Large Artificial Intelligence Models (LAMs), powered by massive datasets, extensive parameter scales, and substantial computational resources, are driving significant transformations across various industries. Yet their practical deployment on resource-limited mobile edge devices is hindered by critical challenges such as data privacy, constrained resources, and high overhead costs. To address this gap, this paper proposes a novel framework named Quantized Split Federated Fine-Tuning Large AI Model (SFLAM). By partitioning the training load between edge devices and servers using a split learning paradigm, SFLAM facilitates the operation of large models on devices and significantly lowers the memory requirements on edge devices. Additionally, SFLAM incorporates quantization management, power control, and bandwidth allocation strategies to enhance training efficiency while reducing energy consumption and communication latency. A theoretical analysis of the latency-energy trade-off is presented, and the framework's efficacy is validated via comprehensive simulations. The findings indicate that SFLAM achieves superior learning efficiency and scalability compared to conventional methods, providing a valuable approach for enabling advanced AI services in resource-constrained scenarios.

I. Introduction

The advent of Large AI Models (LAMs), such as ChatGPT and DeepSeek, marked a significant leap in AI capabilities, powered by their extensive parameter scales, large-scale datasets, and substantial computational resources [1]. As user demand for ubiquitous AI access and real-time, personalized experiences grows, deploying and training these models on mobile devices becomes increasingly relevant [2]. To meet these escalating demands, fine-tuning, which adapts pre-trained models with domain-specific data, has become a widely adopted and efficient strategy for enhancing LAM performance on specialized tasks, offering a cost-effective path to superior results.
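
The abstract does not state SFLAM's system model. To show why quantization, power control, and bandwidth allocation interact, here is a sketch that assumes the textbook Shannon-capacity uplink, rate = B·log2(1 + p·g / (N0·B)), and charges transmit energy as power × airtime for the cut-layer activations a device uploads in split learning. All names and parameter values are hypothetical, not the paper's formulation.

```python
import math
from typing import Tuple


def uplink_rate_bps(bandwidth_hz: float, tx_power_w: float,
                    channel_gain: float, noise_psd_w_per_hz: float) -> float:
    """Shannon capacity B * log2(1 + p*g / (N0*B)) of the device-to-server link."""
    snr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def upload_cost(num_activations: int, bits: int, bandwidth_hz: float,
                tx_power_w: float, channel_gain: float,
                noise_psd_w_per_hz: float) -> Tuple[float, float]:
    """Latency (s) and transmit energy (J) for sending the cut-layer
    activations quantized to `bits` bits each."""
    payload = num_activations * bits
    rate = uplink_rate_bps(bandwidth_hz, tx_power_w, channel_gain, noise_psd_w_per_hz)
    latency = payload / rate
    return latency, tx_power_w * latency
```

Halving the bit width halves both latency and energy. Raising transmit power, by contrast, increases the rate only logarithmically while energy grows with power: with B = 1 MHz, g = 1e-9, and N0 = 1e-15 W/Hz, going from 1 W to 2 W cuts the latency of a 0.8-Mbit upload from 0.8 s to about 0.50 s but raises energy from 0.8 J to about 1.01 J.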

energy consumption, large language model, machine learning

arXiv.org Artificial Intelligence

2504.09114

Country:

Asia > China (0.28)
North America (0.28)
Europe (0.28)

Genre: Research Report (1.00)

Industry:

Energy (0.88)
Information Technology > Security & Privacy (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.86)

Robust Iterative Value Conversion: Deep Reinforcement Learning for Neurochip-driven Edge Robots

Kadokawa, Yuki, Kodera, Tomohito, Tsurumine, Yoshihisa, Nishimura, Shinya, Matsubara, Takamitsu

arXiv.org Artificial Intelligence, Aug-23-2024

A neurochip is a device that reproduces the signal-processing mechanisms of brain neurons and computes Spiking Neural Networks (SNNs) quickly and with low power consumption. Neurochips are therefore attracting attention for edge robot applications, which suffer from limited battery capacity. This paper aims to achieve deep reinforcement learning (DRL) that acquires SNN policies suitable for neurochip implementation. Since DRL requires complex function approximation, we focus on conversion techniques from Floating Point NNs (FPNNs), because conversion is one of the most feasible ways to obtain SNNs. However, DRL requires a conversion to an SNN at every policy update to collect learning samples for each DRL cycle, which updates the FPNN policy and collects samples with the SNN policy. Accumulated conversion errors can significantly degrade the performance of the SNN policies. We propose Robust Iterative Value Conversion (RIVC), a DRL method that both reduces conversion errors and is robust to them. To reduce conversion errors, the FPNN is optimized with the same number of quantization bits as the SNN, so its output is not significantly changed by quantization. To make the policy robust to conversion errors, the quantized FPNN policy is updated to widen the gap between the probability of selecting the optimal action and those of the other actions. This step prevents the policy's optimal actions from being unexpectedly replaced. We verified RIVC's effectiveness on a neurochip-driven robot. The results showed that RIVC consumed 1/15 of the power and computed five times faster than an edge CPU (quad-core ARM Cortex-A72). The previous framework, which has no countermeasures against conversion errors, failed to train the policies. Videos from our experiments are available: https://youtu.be/Q5Z0-BvK1Tc.
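
The abstract gives the intuition behind RIVC's robustness step but not its update rule. A minimal sketch of that intuition, assuming a linear action-value policy and symmetric uniform k-bit weight quantization with a per-row scale (both assumptions, not the paper's network or neurochip format): rounding moves each weight by at most half a step, so each action score moves by at most δ = (scale/2)·‖s‖₁, and if the best action leads the runner-up by more than 2δ, quantization cannot replace it. A softmax policy ranks actions by the same scores, so a larger score gap is what keeps the most probable action in place. All names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def quantize_symmetric(row: List[float], bits: int) -> List[float]:
    """Fake-quantize one weight row to signed `bits`-bit levels with a per-row scale."""
    assert bits >= 2, "symmetric quantization needs at least 2 bits"
    levels = (1 << (bits - 1)) - 1          # e.g. 3 for 3 bits, 127 for 8 bits
    scale = (max(abs(v) for v in row) or 1.0) / levels
    return [round(v / scale) * scale for v in row]


def action_scores(weights: Matrix, state: List[float]) -> List[float]:
    """Linear action preferences: one weight row per action."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def greedy_action(scores: List[float]) -> int:
    return max(range(len(scores)), key=scores.__getitem__)


def greedy_action_is_robust(weights: Matrix, state: List[float], bits: int) -> bool:
    """True if k-bit rounding of the weights provably cannot change the greedy action.
    Each score moves by at most delta = scale/2 * ||state||_1, so a
    best-vs-runner-up gap larger than 2 * delta is safe."""
    assert bits >= 2, "symmetric quantization needs at least 2 bits"
    levels = (1 << (bits - 1)) - 1
    l1 = sum(abs(s) for s in state)
    delta = max(0.5 * (max(abs(v) for v in row) or 1.0) / levels * l1 for row in weights)
    scores = action_scores(weights, state)
    best = greedy_action(scores)
    runner_up = max(s for a, s in enumerate(scores) if a != best)
    return scores[best] - runner_up > 2 * delta
```

With the weights [[1.0, 0.2], [0.1, 0.9], [0.3, 0.3]] and state [1, 0], the gap of 0.7 exceeds 2δ ≈ 0.33 at 3 bits but not 2δ = 1.0 at 2 bits, so only the 3-bit case is certified. The check is sufficient, not necessary: a failed certificate means the action could change, not that it will.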

conversion error, neurochip, snn policy

arXiv.org Artificial Intelligence

2408.13018

Country: